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1. Introduction 


The biochemical models describing complex and dynamic metabolic systems are typically 
multi-parametric and non-linear, thus the identification of their parameters requires non- 
linear regression analysis of the experimental data. The stochastic nature of the experimental 
samples poses the necessity to estimate not only the values fitting best to the model, but also 
the distribution of the parameters, and to test statistical hypotheses about the values of these 
parameters. In such situations the application of analytical models for parameter 
distributions is totally inappropriate because their assumptions are not applicable for 
intrinsically non-linear regressions. That is why, Monte Carlo simulations are a powerful 
tool to model biochemical processes. The classification of Monte Carlo approaches is not 
unified, so here we comply with the interpretation given in (Press et al., 1992), where the 
general Monte Carlo approach is to construct parallel virtual worlds, in which the 
experimental estimates will play the role of true parameters, if the way in which the true 
parameters generate a sample is known. Bootstrap is a modification of Monte Carlo, which 
uses very few premises imposed on the data, and does not need to know the mechanism by 
which the true parameters generate experimental samples. Instead, resampling with 
replacement from the experimental sample is used to construct synthetic samples. 

As far as confidence intervals (CI) are concerned, literature offers multiple types, but each of 
them belongs to one of the two main groups: root (Politis, 1998) and percentile intervals 
(Efron & Tibshirani, 1993). The difference in the philosophy of those two CI types is 
substantial for the biochemical interpretation of results. The difference here is explained 
with the difference between classical statistics (where the parameters are fixed unknown 
quantities) and Bayesian statistics (where the parameters are random variables with 
unknown distributions), and also with the philosophical differences between objectivity and 
subjectivity of scientific research. The main conclusion is that root confidence intervals are 
confidence intervals of the investigated parameters, whereas percentile confidence intervals 
refer to the estimates of the investigated parameters. 
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Our first application of Monte Carlo and Bootstrap simulation procedures is with a 
simulation platform for training students in medical biochemistry (Tenekedjiev & Kolev, 
2002). In this system, students search for estimates and confidence intervals of parameters of 
a given biochemical system for different enzyme-substrate pairs. The platform applies 
Monte Carlo simulation on two stages. Initially, a Monte Carlo procedure is applied to 
emulate a biochemical experimental measurement setting along with given enzyme kinetic 
reactions as realistically as possible. The system is in position to simulate continuous 
enzyme assay (used for adjustment of the “experimental” conditions) and end-point enzyme 
assay “measurements” (suitable for parameter identification). We use an ordinary 
differential equation (ODE) as basis of the generation of pseudo-experimental data. The 
pseudo-real nature of the generated data is ensured by the random incorporation of three 
types of errors for each repetition of the experiments. The Briggs-Haldane steady-state 
model is fitted to the pseudo-measured and end-point assay data obtained by the system. 
The kinetic parameters can be calculated by y* -minimization. The task is simplified by the 
existence of a good initial guess from a linearized Lineweaver-Burk model. The two- 
dimensional root confidence regions of the parameters can be calculated by either Monte 
Carlo or Bootstrap, following similar procedures. The best point estimate is identified using 
trimmed mean over the flipped parameters taking only the values from the identified root 
confidence region. 

In the majority of biochemical reactions, parameters are unknown in very wide intervals, 
and may have different numerical order. Finding the root confidence regions (intervals) 
includes parameter flipping, which often generates results with an incorrect sign. That is 
why, in a second example (Tanka-Salamon et al., 2008) we propose a multiplicative 
modification for the estimation of root confidence regions and the best estimate of the 
parameters, which ensures that all estimates will have a physical meaning. The main 
assumption is that the ratio between the true parameter value and the optimal parameter 
value derived from the true data sample has the same distribution as the ratio between the 
optimal parameter value derived from the true data sample, and the optimal synthetic 
parameter value derived from the synthetic data sample. The assumption is equivalent to 
performing classical Bootstrap over the logarithms of the estimated parameters. This 
method is applied in a real experimental set-up for the estimation of root confidence regions 
of kinetic constants and root best estimates in amidolytic activity of plasmin under the 
influence of three fatty acids. By doing so, the inhibition effect of the three fatty acids can be 
proven and quantified. The measured data have the form of continuous reaction progress 
curves with several replicas. The product concentrations are predicted by three different 
models with increasing complexity. We model the instability of the inhibited enzyme and 
represent the resulting continuous assay model with concomitant inactivation of the enzyme 
as a system of two stiff ODE. From there, we derive the closed form of the progress curve. 
The four-dimensional root confidence regions are acquired by Monte Carlo simulation in 
every data point in each of the progress curves using an analytical model of the measured 
standard deviation, similarly to the first example. 


2. Monte Carlo and Bootstrap confidence regions of parameters 


Statistical simulation methods are a powerful tool in the analysis of complex systems. The 
most popular among them is Monte Carlo. The numerical techniques that stand behind this 
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method are based on statistical simulation, i.e. on any method that uses random number 
sequences to conduct a simulation. The essence of the method is that it provides integral 
measures of uncertainty of the simulated system based on the known uncertainties of its 
parts (Hertz & Thomas, 1983). The integral measures are calculated on the basis of a large 
number of system instances in different pseudo-realities, each defined by a specific set of 
randomly generated states of its parts. The Monte Carlo approach is successfully employed 
in finding estimates of parameters that define the behavior of different stochastic systems. 
The flexibility, the very few premises imposed on the data and the application in hypothesis 
tests and statistical parameter assessment are what makes simulation techniques widely 
accepted. 

Following the interpretation of (Press et al., 1992), one can assume that there is an 
experiment that intends to assess a certain set of M parameters that define the behavior of a 
measurable stochastic system. The true values 4,,,,, of those parameters are unknown to the 
observer, but they are statistically realized in a set of real measurements Dp available to the 
observer (called a learning sample), which incorporates some random error. Then the 
experimenter can assess the parameters of a given model so that the discrepancy between 
the modeled and the measured data is minimized (e.g. using y? -minimization or some 
other method) and a set of real parameters ñg is formed. Due to the random character of the 
sample generating process, repetitions of the experimental measurement would generate 
many other possible measurement sets - Dı, Dz Ds, ... - with identical structure as Do. Those 
in turn would generate sets of real parameters respectively 4,,4,43, ... that slightly differ 
from each other. So the estimate a is just an instance of the M-dimensional random 
variable å. 

The task is to find the distribution of the deviation of the real parameters from the true ones, 
when just Do is available. This random variable is called a root (Beran, 1986; DiGiccio & 
Romano, 1988). As long as å ue is not known, the general approach is to create a fictitious 
world, where the true parameters are substituted with the real ones. The main assumption is 
that the root in the real world has the same distribution as in the fictitious world. 

If we know the process that generates data under a given set of parameters dy, then we can 
simulate synthetic data sets D D ny D6. with the same structure as the learning 
sample. The Bootstrap method (Efron & Tibshirani, 1993) applies to cases where the process 
that generates the data and/or the nature of the measurement error are unknown, and the 
learning sample Do is formed out of N independent and identically distributed (iid) 
measurements. The Bootstrap generates synthetic samples DÎ, D3, ..., D6 with the same 
structure as Do by drawing with replacement from Dp. In (Press et al., 1992) this type of 
Bootstrap is called quick-and-dirty Monte Carlo. There is another Bootstrap version, called 
exact Bootstrap, which forms all synthetic samples that can be generated by drawing with 
replacement from Do. However, this method is rather impractical in a real problem so we 
shall not stress it here. 

The parameters a; , @3, ..., ig can be identified from the synthetic samples in exactly the 
same way as fy was identified from Do, which in turn would generate instances af = a; — Ag 
(q=1, 2, ..., Q) of the root 7/ in the fictitious world. If sufficiently large number of such 
instances Q is created, then it is possible to construct the empirical distribution of the root 
7! in the fictitious world, which according to the main assumption coincides with the 


distribution of the root F =a-4,,,,,. 
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The ultimate step in the simulation is to assess confidence intervals or confidence regions of 
the investigated M-dimensional parameter. There are different methods to assess one- 
dimensional CIs (e.g. percentile-t, bootstrap-t, bias-correction, simple bootstrap interval, etc.; 
see (Davison & Hinkley, 1997; MacKinnon & Smith, 1998)), yet for multi-dimensional Cls 
practically there are only two methods - root and percentile methods. The distinction of 
those two stems from the different approaches to probabilities in general. 

If the frequentist definition of probabilities is adopted (Von Mises, 1946), then the likelihood 
inference has to be applied in parameter identification (Berger & Wolpert, 1988). Here the 
parameters 4j,,,. are considered unknown, but deterministic values. In that sense, the 
estimate ñ is the only value, which is random, so its confidence region can be calculated. 


x true 


Since the distribution of 7/ and 7 coincide, then the following is true for the random 


x true 


variable a: Ā=ñpe +P" =4,,,,+7/. Then, the instances a, of a may be generated by 
replacing 4,,,. with dy as the only available estimate: a; = ay + A = ip 4 äs Ay = ai; . The so- 
called percentile confidence interval (or region) is identified from the instances of the synthetic 
parameters a; and it is in fact the confidence region of the estimate. 

If the subjective definition of probabilities is adopted (Jeffrey, 2004), then Bayesian statistics 
can be used in the parameter identification (Berry, 1996). Here the parameters fue are 
considered to be random variables, whose distribution can be assessed, and its confidence 


a true 


region may be calculated. Since the distributions of 7/ and 7 coincide, then the 


following is true for the random variable @: G,,,,=a-7'"’ =a—7/. Then, the instances 


Of Gj. may be generated by replacing a@ with fy as the only available estimate: 
=i) -7j =ñ (a; fig ) = 2ā0 ñ. The so-called root confidence interval (or region) is 
identified from the instances of the flipped (around the real parameters) synthetic 
parameters ao =2fy -ā; and it is in fact the confidence region of the true parameter. 
Furthermore, using this approach it is possible to find a point estimate pest Of G,. that is 
better than äg. One possibility is to find äs as the mean value of the flipped synthetic 
parameters ai J’, Since the method is sensitive to errors (Davidson & MacKinnon, 1999), it is 
better to use trimmed mean value (Hanke & Reitsch, 1991). Regardless of the type of mean 


value, the resulting point estimate 4,,., is unbiased unlike ay. 


rye, 


Atrue,q 


3. Generation and processing of data in enzyme kinetics 


Metabolomics (Strohman, 2002) deals with the evaluation of the dynamic metabolic 
networks and links the genetic information to the phenotype through adequate metabolic 
control analysis. It is a prerequisite in the understanding of the cellular phenotype and its 
pathological alterations. For that purpose, one not only requires the technical developments 
that allow stringent monitoring of the metabolic fluctuations induced in vivo by biological 
signals and environmental changes, but also powerful mathematical models capable of 
treating dynamic metabolic systems in their variability. The principles of metabolomics are 
well known in biochemistry (Newsholme & Leech, 1984; Voet & Voet, 1995), but the training 
of biomedical and clinical researchers is still insufficient to exploit the opportunities 
provided by the up-to-date computer-intensive statistical procedures applicable in the field 
of enzyme kinetics. For that purpose, our earlier work (Tenekedjiev & Kolev, 2002) 
introduces a computer-simulated experimental setting, in which the user (a graduate 
medical student, who is familiar with the basic ideas in enzyme kinetics and the structure of 
metabolic pathways) acquires skills in adjusting experimental conditions to conform model 
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assumptions, in identification of kinetic parameters and determination of their confidence 
intervals, in application of these parameters for metabolic predictions in context-dependent 
in vivo situations. In the proposed system, students search for estimates and confidence 


kı +k 
-= of an enzyme-catalyzed reaction 


1 


intervals of the parameters kp and Ky = 


k k 
E+S¢—— ES—* >E+P (3.1) 
-1 
for different enzyme-substrate pairs, where the substrate degrades into product under the 
influence of the enzyme. The enzyme and substrate bind reversibly to form enzyme- 
substrate complex, which in turn irreversibly transforms into product. 
The proposed system applies Monte Carlo simulations at two levels. 


3.1 Simulation of the experimental equipment data generation 

Initially, the system uses Monte Carlo simulations to emulate enzyme kinetic reactions as 
close as possible to the real lab setting. The user can perform continuous enzyme assay and 
end-point assay “measurements”. 

The continuous enzyme assay simulates an expensive experiment, where the product 
concentration is measured at equally spaced time intervals (sampling time), and the mean 
reaction rate is calculated for different conditions. Here it is not envisaged to make 
repetitions and perform identification of the parameters. The user sets up the enzyme- 
substrate pair, the designed concentrations of the total enzyme E% (the free enzyme 
concentration plus the enzyme-substrate complex concentration) and the initial substrate 
ge , the temperature T, the pH, the overall experimental time tver and the sampling time At 
(see Fig. 3.1). As a result, the user gets the time course of the pseudo-measured product 
Pmesi, At), for i=1, 2, ..., fov"/ At. The time course of the substrate, the product, the free 
enzyme and the enzyme-substrate complex are recalculated from Pes (see Fig. 3.2). These 
results form a single replica of the process. 


E] conditions of the continuous assay for the study of effects on reaction rate aa ees) 


reaction E2-B ; in vivo concentrations: total enzyme- 0.0003(mM) substrate- 0.024(mM) 
to study the time-course of the concentrations of E, ES, S and P 
fill in the conditions for the continuous assay: 


overall experiment 
time in (min) 


temperature 
in (degrees C) 25 


pH 
total enzyme 
concentration in (mw) 0.00034 


initial substrate A 
concentration in (my 0-24 


‘sampling time 
in (sec) 


ox | Cancel_| Help | 


Fig. 3.1. Setting up the designed concentrations of the total enzyme E" and the initial 
substrate S{*, the temperature T, the acidity pH, the overall experimental time f°” and the 
sampling time At for a given enzyme-substrate pair 
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time(min) 
Fig. 3.2. Time course of the substrate, the product, the free enzyme and the enzyme- 
substrate complex from the setting in Fig. 3.1 


For each replica we adopted the Briggs-Haldane steady-state approach (Segel, 1993) to 
model the transformation of the substrate into product in the biochemical system (3.1) using 
the ODE: 


ape ae (t, T, pH) p” { so = pirue ) 


dt Km,app (pH) + oe _ pe 


; (3.2) 


In (3.2), the true concentration of the product P* is a function of the time t; E“? is the true 
concentration of the total enzyme; Sj“ is the true concentration of the substrate; kpapp and 
Kapp are the apparent constants kp and Ky in this replica. The initial condition of (3.2) is that 
the true concentration of the product is zero at the beginning: P*e(0)=0. 

To model the biological diversity of the substrate-enzyme pair in the real experiment, kp,app 
and Kuapp are instances of random variables. The apparent constant Km,pp depends also on 


pH, whereas kp app depends on pH, T, and t: 


í + DE + 10k, Jfa p Te 10" 
ae : 


10°" 10" Ki 
Ku app A 1 a R, (3.3) 
H +10?" Kes, 
10”? Kes, 
A oe 16 
i it 10 FoR (3.4) 
papp 1 H >p i 
+——,—— +10" Kes, 
10?" .Kes, 
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A3.(T-T,)*.Rp if T>Ty 
0 EST, 


where, B -| (3.5) 


In (3.3), (3.4) and (3.5), Rp, Rm and Rg are instances of a positive continuous uniformly 
distributed random variable on the interval [1- Apar ; 1+ Apar ], where Apa <1 is given. The 
constants Ty, A1, Az A3, g, Key, Kez, Kes1, Kesz, Kai, Kaz are known typical constants for each 
enzyme-substrate pair, with the following meaning: T4 - the temperature over which the 
enzyme degradation begins (in degrees C); Ai - the logscale factor of the apparent kp 
constants at t=0 (in 1/min); A2 - the heat acceleration factor of the apparent kp constants at 
t=0 (in 1/min); A; - the scale factor of the temporal enzyme degradation constant B; g - the 
power in the temporal enzyme degradation constant B; Ke: - the pK value for the first H+ 
dissociation constant of the free enzyme; Kez - pK value for the second H+ dissociation 
constant of the free enzyme; Kes; - the pK value for the first H+ dissociation constant of the 
enzyme-substrate complex; Kes2 - the pK value for the second H+ dissociation constant of 
the enzyme-substrate complex; Ka; - the pK value of the first acidic dissociation group in the 
diprotic substrate; Kaz - the pK value of the second acidic dissociation group in the diprotic 
substrate. 

To model the imperfection of the setup in the real experiment, Sj“ and Ej" are instances 
of random variables, which slightly deviate from the designed S° and Ei* : 


pie = Ries Re (3.6) 
Si 2 oe Rs (3.7) 


In (3.6) and (3.7), Re and Rs are instances of positive continuous normally distributed 
random variables with unit mean values and standard deviations respectively c.E/* +d, 
and c.S{’ +d, where c and d are given. 

To model the measurement error in the real experiment, each of the pseudo-measured 
values of product concentration Pmes(i, At ), for i=1, 2, ..., tover/ At is an instance of a random 
variable, which slightly deviates from the true product concentration Pive(i. At ), for i=1, 2, 
..., bover/ At that results from integrating the ODE (3.2) from 0 to tover; 


Pmes(i, At )= Ptre(i, At ).Rpj, for i=1, 2, ..., torf At (3.8) 


In (3.8), Rp; are instances of positive continuous normally distributed random variables with 
unit mean values and standard deviations b.(P!e(i. At))2, where a and b are given. 

After finding Ps it is possible to find the measured time course of the substrate, of the 
enzyme-substrate complex, and of the free enzyme: 


S” (At) = Sy’ — P™* (i.At) 
Ey S" (iAt) 
Km,app + S” GA) 

BM (1At) = EP" —ES™ (i. At) 


ESM (1At) = orm Pere 2) 


The quantities in (3.8) and (3.9) are shown on Fig. 3.2. The measured velocity of the process 
can be approximated with the formula: 
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Vmes = Pimes(fover) /pover (3 .10) 


The purpose of the end-point assay experiment is to teach the user how to manually setup 
“optimal” experimental conditions. First of all, the steady-state assumptions have to be 
checked (that the concentration of the enzyme-substrate complex should stay approximately 
the same). For example, Fig. 3.2 shows that this condition is not met under the selected 
experimental conditions. An experiment that meets the requirements of the steady-state is 
shown on Fig. 3.3. Students can visually check the validity of the empirical "criteria" for the 
steady-state model (the degraded substrate should be up to 10% of the initial one, and the 
total enzyme concentration should be far less than the concentration of the initial substrate 
plus the constant Kapp), which otherwise should be proven with complex mathematical 
procedures (Segel, 1988). The influence of temperature, of acidity, of the substrate and 
enzyme concentrations, and the overall time of the experiment can be estimated by 
constructing the velocity graphics of the reaction as a function of the investigated condition. 
For example, Fig. 3.4 shows the influence of temperature on the product formation. The first 
graphics shows a continuous monitoring of the product generation in the course of the 
enzyme-catalysed reaction at various temperatures. The continuous assay illustrates also the 
intrinsic errors of the sampling procedure, which impose the necessity for repeated 
sampling. The second graph shows the end-point enzyme activity assay for various 
sampling time. The data are cross-section of the continuous assay. for 1- and 10-min 
incubation. Comparison of the two curves illustrates the apparent nature of the "optimal" 
temperature caused by the time-dependence of the heat denaturation. 


— product 
— substrate 


1.5 


= 


P(mM), S(mM) 
= 
on 


0 01 02 03 04 05 06 OF 08 O89 1 
time(min) 


Fig. 3.3. Adjustment of the steady-state experimental conditions 


As a whole, the instrumental model for a continuous-time assay is used as a manual Monte 
Carlo system to find suitable experimental conditions, appropriate for the end-point assay 
experiment, where the values of the kinetic constants and their confidence regions are 
identified. The end-point assay simulates a cheap experiment, where the user sets up T, pH, 
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tver, E! , J predetermined initial substrate concentrations Sra . ene , and the number 
K of replicas for any substrate concentration. The product concentration is measured K times 


des su 


just at time ter for each So); “(for j=1, 2, ..., J). 


0.9 


— T-20.0 Q 
0.8 H — T=25.0 CO 
— T-30.0 fQ 
— T-35.0 CO 
— T=40.0 Q 
— T=45.0 PC) 
— T-50.0 0 
— T=55.0 PO 
— T-60.0 °C 


— 165.0 CO 
— T=70.0 CO AL 


product(mM) 


time(min) 


— thax 10 (min) 


velocity(mMimin) 


` 20 25 30 35 40 45 50 55 60 65 70 
TCC) 


Fig. 3.4. General conditions for optimal enzyme action 


3.2 Kinetic parameter identification with Monte Carlo or Bootstrap simulation 
Each replica is simulated in the same way as in the continuous-time assay, but the learning 
sample Do only consists of the product concentrations at time toer 


Do =} re i =1,2,...,J;k SAPs where P% is the k-th measurement of the product 


concentration Pes at substrate concentration Son . As long as all the designed experimental 


mes 


conditions are identical for Pj," (k=1, 2, ..., K), they can be referred as a process. Let 


and 


pines:mean pines,std 
J J 


are the mean value and the standard deviation for the end-point 
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product concentration of the jt process, calculated by the K instances P% (k=1, 2, ..., K). A 


non-linear regression model of the standard deviation of the ae final product 
concentration is created as a function of the mean value of the measured final product 
concentration: 


mes mean 
(r +a 


prod,std _ pmesmean 4°18 for j=1, 2.. ] 3.11 
j j J 


The constants a; and m are determined with 7 -minimization of: 


(m,a) => (Ig(Pm*"") —(a, +1)-1g(Pm"") -a i (3.12) 


M 


j=1 


The final product concentrations are predicted by a model different from (3.2), which takes 
the form of J number of ODEs, which can be solved separately: 


d des des mod 
dpr -k Ep (S03 — Pr | 


dt Ky + Soy - pee 


, for j=1, 2, ..., J (3.13) 


The initial conditions of (3.13) are iad (0) =0. In (3.13), kp and Ky are the kinetic constants. 
After integrating the ODE (3.13) from time 0 to fever, the value at tver would depend on the 
values of kp and Ky. At the same time, P"°""™" and pee “4 would Only depend on the 
sample Do. Then the optimal parameters kp, and Kmo can be found with 7? -minimization of 


2 


J pred (+ (t wes Ky)- pimes,mean (Do) 
j over j 
0° (ky/Km,Dp) = 3 pr) (3.14) 
ño = (k,o-Kmo) = a pi 2 (bp KPa) (3.15) 
pr sm 


Solving (3.15) is simplified by the presence of a good initial guess from a linearized model of 
Lineweaver-Burk (Lineweaver & Burk, 1934). 

The two-dimensional confidence region of k, and Ky is again calculated by Monte Carlo or 
Bootstrap simulation. The synthetic samples in the Monte Carlo simulation D (q=1,2,..., Q) 
are formed so that for the j-th process, the final product concentrations are generated as K 
number of instances of a normally distributed random variable with mean value Pj""™" 
and standard deviation Pee ‘4 The synthetic samples in the Bootstrap simulation D 
(q=1,2,..., Q) are formed sa that for the j-th process, the final product concentrations aré 
senaid by K drawing with replacement from the set { PEPIS e Paa . Whatever the 
method for generating D y 


6 = (Ky Rica) =a min Xx (k, „Km P :)) (3.16) 


pa’Kpa 
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In order to find the root confidence interval, the resulting parameters are flipped in 
accordance with the discussion in section 2: 


-Sf 9% _ 3S s 
íj” = 2G) —4, =(2.kp,0 he m paT 


4” 


2.Ku,o King) = (kd Ki) (3.17) 


Let 725 = 7? ( Kes /Kig Do) be the discrepancy measure (3.14), calculated over the original 


sample with the flipped synthetic parameters. Let’s renumerate the acquired discrepancy 


measures y*° in ascending order so that y?® < 73” <...< 7@° . Then the first Q*p sorted 


vectors belong to the two-dimensional (2-D) confidence region with probability p. These 
vectors are called inside simulated points, and the rest are the outside simulated points. The 
area of the inside simulated points determines the root confidence region and its borders 
have constant x? discrepancy measure. The projections of the confidence region on the Km 
and kp axes are the 2-D confidence intervals (Fig. 3.5.). The best point estimate of the 


parameters is calculated as trimmed mean of ai J using just the inside simulated points. 


The percentile confidence region can be calculated likewise, but instead of the flipped 
synthetic parameters, one should use just the synthetic parameters. 


inside simulated points 
outside simulated points 
best guess 

estimated from the sample 


—— 1-D confidence intervals 


0.07 0.08 0.09 0.1 0.11 0.12 0.13 


Kn — (mM) 


Fig. 3.5. Confidence area of the model parameters based on a Monte Carlo simulation 
procedure. 


The described simulation system has been employed for over 8 years to train medical 
students in interpretation of the stochastic nature of experimentally estimated model- 
parameters at the Department of Medical Biochemistry of the Semmelweis University in 
Budapest (Hungary). After this training, students (and the users in general) are able to 
perceive the imperative requirements for multiple sampling replicas in experimentation, to 
interpret the stochastic nature of experimentally estimated model-parameters, as well as to 
gain insights into the application of in vitro determined kinetic parameters for the modeling 
of in vivo metabolic events. 
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4. Influence of fatty acids on the amidolytic activity of plasmin 


The dissolution of intravascular thrombi is performed through the hydrolytic degradation of 
their fibrin matrix catalyzed by the serine protease plasmin (Kolev et al., 2005). Arterial 
thrombi enclose millimolar concentrations of phospholipids (Varadi et al., 2004) and free 
fatty acids (Rabai et al., 2007). These lipid constituents of thrombi are reported to modulate 
the fibrinolytic process (Varadi et al., 2004, Rabai et al., 2007; Hazari et al., 1992; Hazari et al., 
1994; Huet et al., 2004). The paper (Tanka-Salamon et al., 2008) investigates the effects of the 
three most abundant fatty acids in the structure of platelet membranes - arachidonic acid, 
stearic acid and oleic acid representing respectively 22.0, 19.5 and 18.8 % of the total fatty 
acid content of platelet phosphoglycerolipids (Rabai et al., 2007). 

Plasmin (E,=20 nM) was incubated with sodium salts of fatty acids for 15 min at 37 °C. Then 
180 pl of this mixture was added to 20 yl Spectrozyme-PL (H-D-norleucyl- 
hexahydrotyrosyl-lysine-p-nitroanilide, American Diagnostica, Stamford, CT) at 7 different 
concentrations in the range 0.05 - 6 mM yielding final concentration Soj (j=1,2,...,7) in the 
volume of the reaction mixtures. The light absorbance at 405 nm (A405), which reflects the 
release of p-nitraniline, was measured continuously at t; (i=1,2,...,60) time points in the 
course of 10 min at 37 °C, 4 parallel measurements were done for each So. The main 
problem in the initial data processing is the proper assessment of the baseline absorbance 
and the delay time of the measurement, which affect profoundly the absolute values of the 
pNA product. An original algorithm was employed to convert the measured absorbance 
into product concentration Prk (the notation indicates the product concentration at time t; 


for the k-th replica with Soj) which form the learning sample Do: 
Do ={ ne Ji =1,2,...,60; j =1,2,...,7;k = 1,2,3,4} . As long as all the designed experimental 
conditions are identical for Pre (k=1, 2, 3, 4), they can be referred as a process. Let 


pits thear 
ij 


mes,std 
and P; 


are the mean value and the standard deviation for the product 


concentration of the jth process at time ti, calculated by four instances P7} (k=1, 2, 3, 4). A 
non-linear regression model of the standard deviation of the measured product 
concentration for each process is created as a function of the mean value of the measured 


product concentration: 


Bee p P forget T (4.1) 


The constants a; and b; are determined with ° -minimization of: 


(e(Rr)-9)48( Re") —Ie(0,)) (4.2) 


The product concentrations are predicted by three different models with increasing 
d ‘ k ; 
complexity. In the simplest case (Model I) the scheme E + S<-*ES—” ->E +P is assumed, 
ka 
where E is plasmin, S is Spectrozyme-PL, P is p-nitroaniline, kı, k2 and k1 are the respective 
reaction rate constants. With the quasi-steady-state assumption the differential rate equation 
for this scheme is 
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mod mod 
dpm! ky Exo-(5o,;- Pr) 
dt Km + So, = pred 


, for j=1,2, ...,7 (4.3) 


k_ +k, is 
k 

the Michaelis constant, whereas k,=k2 is the catalytic constant (Cornish-Bowden, 2004). The 

integration of (4.3) under initial condition pro (0)=0 for j=1,2, ..., 7 gives 


where Ep and So are the initial concentrations of plasmin and its substrate, Ky, = 


t i pied 4 Km In 


= (4.4) 
kyEy '  kpEo Sjop” 


The time t in (4.4) is a strictly increasing function of prod for any combination of Km and kp 


and therefore it has an inverse function oot = Pek, Kyrky/So,j-E:), which can be 
numerically estimated for all measured time points t; by a look-up table procedure and the 
results can be denoted as pred , 1=1,2,..., 60; j=1,2,..., 7. 

Because in the course of certain experiments the reaction rate declined faster than predicted by 


Model I, the more general scheme E +S a ES—2->EP — E+ P was also tested (Model 
4 3 

II), which accounts for the accumulation of the product and its complex with the enzyme. 

Assuming steady-state for both ES and EP complexes the differential rate-equation is 


mod mod 
dP ky -Ex9-(So,; - Pi") (4.5) 
mod mod 
dt Km-(1+K;-P;"")+So,; — P; 
where Km _ (ka thy) ty is the Michaelis constant, k ds NAS, is the catalytic constant, 
ky (ky +k3) Po ky +k 


ands K; -1 is the equilibrium association constant for the product. Although the 
3 

Michaelis constant and the catalytic constant derived for Model I and Model II have different 

algebraic form, their meaning within the context of the specific catalytic mechanism is 

identical; the Km is the substrate concentration at which the initial reaction rate is half of the 

maximal rate possible for given enzyme concentration, whereas kp has the properties of a 

first-order rate constant defining the capacity of the enzyme-substrate complex to form 


product (Cornish-Bowden, 2004). Integrating (4.5) under initial condition p” (0) =0 for 
j=1,2, ...,7 gives 


_1-K,,K; pred , Rar Saji) 5 So, 


t J mod 
k, Eo ky Ew Sj- È 


(4.6) 


The time t in (4.6) is a strictly increasing function of proi for any combination of Km, kp and 


1 


Ki and therefore it has an inverse function P” = P!""(t,K,,,k,,K;,S, E), which can be 
j j M’"p 0,j7°~t0 
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numerically estimated for all measured time points t; by a look-up table procedure and the 
results can be denoted as pred , 1=1,2,..., 60; j=1,2,..., 7. 
Because in certain cases the product inhibition could not model the progress curve of the 


reaction satisfactorily, the instability of the enzyme in the assay system was also considered 
according to the scheme suggested in (Duggleby, 1995; Duggleby, 2001) (Model III): 


k 
E+Se = ES— EPEE +P 


-1 -3 


th th VJs 


E' E'S E'P 


in which E’ indicates the inactive form of the enzyme, Jı, J2 and J3 are the rate constants for 
inactivation of the respective forms of the enzyme (Jı and Jz are decay rate constants of the 
enzyme-substrate complex; J3is the decay rate constant of the enzyme product complex). In 
independent measurements with fatty acids we showed that the inactivation of free plasmin 
for the duration of the amidolytic assay was negligible (data not shown) and consequently 
the differential equation for the changes in enzyme concentration was derived only for the 
scheme with J, and J; (i.e. J:=0) yielding seven separately solvable systems of two ODEs: 


mod mod 
aP ky Ey (Soj = Pr?) 
dt Kull +KP?) +S, ,- Pin 
dE, j Jo-E, (So, ; 2 pren) /Km+ Ja-E; Kip” 
mod mod 
dt 1+K;P + (Sg j- Pl") / Ky 


for j=1,2,...,7 (4.7) 


The initial conditions of (4.7) are P" (0)=0, E, ;(0) = E,- In (4.7), kp and Km have the same 


meaning as in Model II. The integration of ODE systems (4.7) from time 0 to tso was done by 
quasi-constant step size implementation in terms of backward differences of the 
Klopfenstein-Shampine family of Numerical Differentiation Formulas of orders 1-5 and the 
initial steps were determined so that the solution would stay in its domain (0 < ppi <Sp j7 


OSE, s E, ) during the whole integration (Shampine et al., 2003). The values of the product 


concentrations at time points t; for Model III can be found from the first component of the 
solution: P= PP (t, Kysky/Ki,Jo/J3So,j/Es) for 1, 2, ..., 60 and for j=1,2,..., 7. 

Since Model I and Model II are special cases of Model III for given values of Jz J3 and Kj, then 
further discussion only refers to Model III. 

As in section 3, pee depends only on the kinetic parameters K,,,k Ki Jo,J37 eae and 


l 


pepas would only depend on the sample Do. Then the optimal parameters kpo, Kmo, 


K; o- J20, J3 ocan be found with 7? -minimization of 


2 
ie Pw ky Km: Ki Ja Js )- Pre" (Do) 
1 (kp Ku KiJ Do)= >) i (k, 7) j 
aa D.) 


(4.8) 
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ño =(ky0/Ku,o-Ki,0-J2,07]3,0) sarg, in, 2 (by KueKi Je lPa)] 2) 
The minimization (4.9) was performed using the Nelder-Mead simplex direct search method 
(Lagarias et al., 1998). Since the optimization assigns values of less than 10-12 (sec-!) to Jz, 
from now on only J3 shall be considered. 
The four-dimensional confidence region of kp Km, Ki, and Js is calculated by Monte Carlo 
simulation. The synthetic samples D> (q=1,2,..., Q) are formed so that for the i-th time point 
of the j-th process, the product concentrations are generated as four instances of a normally 
distributed random variable with mean value P’*""" and standard deviation praet, 
Then 


=5 _(15 KS Ss 7S \_ F 2 s 
fq = (aKa Kip ia) ars|, min z (ky Ku KiJs:D; ) (10) 
Let’s find the root confidence region of the parameters. That requires flipping the 
parameters as in (3.17). However, the resulting values may have no physical meaning (could 
be negative) unlike the case in section 3, because here we operate with real measurements, 
and there is no appropriate initial guess for the optimization. This situation is a rule rather 
than an exception in biochemical analysis, where parameters are supposed to be strictly 
positive. Therefore, one possible solution to find the root confidence region in such cases is 
to use a modification of the classical Monte Carlo procedure, called multiplicative Monte 
Carlo. The main assumption here, in accordance with section 1, is that the ratio between the 
true parameter value and the optimal parameter value derived from the true data sample 
Struc has the same distribution as the ratio between the optimal parameter value derived 
a 

from the true data sample, and the optimal synthetic parameter value derived from the 
fy 
ay 
over the logarithms of the estimated parameters. Then, the flipped parameters are derived 
as follows: 


synthetic data sample The assumption is equivalent to performing classical Bootstrap 


gsf — 240 2 2.k, 0 2.Km, o 2Kio 2-J3,0 =( Sf KS KSS Bd) (4.11) 
q = S$ '’ 7S A 7S ‘' 7S pq” ~Ma ig 53,9 
a kpa Kma Kiq Ba 


q 
Then the root confidence region is derived in the same way as in section 3. Let 


2S _ ,2(,S.f vSf Sf Sf i 
=x (ee Kia Ki i J33 P Do) be the discrepancy measure (4.8), calculated over the 


original sample with the flipped synthetic parameters. The discrepancy measures Pel are 
i i 2S < 2,8 2,5 ma i 
renumerated in ascending order so that AP SHË SS The inside and outside 


simulated points are identified as in section 3. Again, the area of the inside simulated points 
determines the root confidence region. The best point estimate of the parameters is 
calculated as trimmed geometrical mean of äp using just the inside simulated points. Since 


it is impossible to plot the 4D confidence regions, they are visualized in pairs (e.g. Fig. 4.1 
shows the confidence region of Km and kp, and of K; and Js). The kinetic parameters of 
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plasmin in the presence of the three fatty acids at different concentrations are given in Table 
4.1. It also gives the 2D confidence intervals (2D) and best estimates, whereas the multi- 
dimensional confidence regions are given on Fig. 4.2. 


Concentration of Ku (uM) ky (sect) Ki ( u M2) Js (sec) 
added fatty acid 

(uM) BE CI BE CI BE CI BE CI 
None 5.89 5.43-6.38 5.81 | 5.70-5.93 NA NA NA NA 
Oleate 

10 12.58 | 11.68-13.41 | 4.54 | 4.48-4.60 NA NA NA NA 

25 20.09 | 18.76-21.26 | 3.65 | 3.56-3.73 NA NA NA NA 

45 27.49 | 26.43-28.57 | 2.63 | 2.58-2.68 NA NA NA NA 

65 131.09 |115.33-146.13| 2.75 | 2.57-2.95 NA NA NA NA 
Arachidonate 

10 23.71 | 22.91-24.58 | 6.10 | 6.06-6.15 NA NA NA NA 

25 42.65 | 38.28-47.66 | 3.57 | 3.43-3.71 NA NA NA NA 

45 57.51 | 55.24-59.60 | 3.68 | 3.62-3.73 NA NA NA NA 

65 59.85 | 56.76-62.73 | 2.40 | 2.35-2.44 NA NA NA NA 
Stearate 

65 8.17 7.35-9.10 7.03 | 6.93-7.12 | 0.025 | 0.011-0.053 | 0.026 | 0.018-0.038 
115 11.33 | 9.42-13.41 7.48 | 7.37-7.61 | 0.012 | 0.008-0.402 | 0.056 | 0.001-0.158 
175 23.37 | 20.98-26.88 | 12.39 | 11.92-12.73 | 0.007 | 0.005-0.009 | 0.062 | 0.052-0.069 
230 72.96 | 69.86-75.86 | 14.77 | 13.94-23.85 | 0.002 | 0.001-0.003 | 0.074 | 0.001-0.356 


Table 4.1. Kinetic parameters of plasmin in the presence of fatty acids. Numerical values of 
the best estimates (BE) and their 95% confidence intervals (CI). 


The described study implemented a novel numerical procedure based on Monte Carlo to 
give a quantitative characterization of the modulation of plasmin activity by the presence of 
three fatty acids. All three fatty acids caused a 10-20-fold increase in the Michaelis constant 
of plasmin. Based on the ratio of the catalytic and Michaelis constants, all three fatty acids 
acted as inhibitors of plasmin with various degrees of potency - oleate and arachidonate can 
be defined as mixed-type inhibitors of plasmin, whereas stearate has a rather unusual effect; 
the increase in the Michaelis constant is coupled to higher values of the catalytic constant. 
At saturating concentrations of the substrate this effect is seen as apparent activation of 
plasmin in the amidolytic assay. Our findings illustrate the general possibility for a 
modulator to change the kinetic parameters of an enzyme in an independent and 
controversial manner, so that the overall catalytic outcome may vary with the concentration 
of the substrate. In physiological context the reported results indicate that acting as mixed- 
type inhibitors, unsaturated fatty acids stabilize fibrin against plasmin, whereas through its 
discordant effects on the catalytic and Michaelis constants stearate may destabilize clots in 
the process of their formation. 
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Fig. 4.1. Confidence regions of Ky and k, (in the first plot) and of K; and J; (in the second plot) 
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Fig. 4.2. Kinetic parameters of plasmin in the presence of fatty acids (oleate OA, 
arachidonate AA, and stearate SA), and their multi-dimensional confidence regions 


The investigated biochemical problem was a suitable setup to demonstrate the 
multiplicative Monte Carlo, which provides a reasonable physical meaning of resulting 
parameters. Its advantages over the classical methods in tasks, where strictly positive 
parameter values are required, makes it a necessary tool in most biological and biochemical 
parameter identification studies. 


5. Conclusions 


The application of Monte Carlo and Bootstrap techniques in a multi-dimensional setup is 
principally similar to the one-dimensional case. However, instead of CI, we need to find 
confidence regions that contain certain percentage of the entire probability distribution. As 
demonstrated, only root and percentile confidence regions are of interest in the multi- 
dimensional case. The use of either of them stems from the two different approaches to 
probabilities, so researchers should first clarify their viewpoint to probabilities in general 
before choosing the type of confidence regions to exploit. Working with root confidence 
intervals (regions) requires parameter flipping, which in some cases may generate results 
with an inappropriate sign. This is particularly true for biochemical (biological) processes, 
where parameters are supposed to be positive most of the cases. Therefore we presented 
the multiplicative Monte Carlo as a procedure that ensures the physical meaning of 
parameters. 

Since Monte Carlo and Bootstrap are computer intensive methods, all calculation and 
visualization procedures, discussed and demonstrated here were executed with MATLAB 
R2009a. Whenever we used root confidence intervals, we also provided a best point 
estimate, which as a rule is better than the sample estimate. Generally, the mean of the 
flipped parameters can serve for a best estimate, but having in mind the sensitivity to errors 
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(extreme values, outliers in the sample, etc.) it is recommended to use trimmed means. 
Regardless of the way to calculate this best estimate though, it will always be unbiased 
unlike the sample estimate. 

The recent expansion of technological tools in biomedical research poses the requirement for 
appropriate modeling of the processes under investigation. The described examples from 
our work underscore the applicability of Monte Carlo simulations in biochemical models of 
variable complexity providing a robust tool to estimate the reliability of estimated 
parameters. 
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